{2025.06}[SYSTEM] CUDA 12.6.0,12.8.0, cuDNN 9.5.0.50,9.10.1.4 #1351
base: main
…if we can get it to work
Let's first try this for a single CPU arch for each supported CC.
Native builds:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
Cross compiles:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc100
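The native and cross-compile `bot: build` comments in this thread differ only in whether an `on:arch=…` clause is present. The small Python helper below is hypothetical (it is not part of the EESSI bot). It reproduces the exact command strings used here and assumes the clause order shown in these comments.

```python
def bot_build_command(repo, instance, arch, accel, on_arch=None):
    """Hypothetical helper: format a 'bot: build' comment as used in this PR.

    A native build omits the 'on:' clause. The cross-compile variant adds
    'on:arch=<arch>', which (as assumed here) selects the build node type,
    so the build node need not have the target GPU.
    """
    parts = ["bot: build", f"repo:{repo}", f"instance:{instance}"]
    if on_arch is not None:
        parts.append(f"on:arch={on_arch}")
    parts.append(f"for:arch={arch},accel={accel}")
    return " ".join(parts)


native = bot_build_command(
    "eessi.io-2025.06-software", "eessi-bot-surf",
    "x86_64/intel/icelake", "nvidia/cc80",
)
cross = bot_build_command(
    "eessi.io-2025.06-software", "eessi-bot-mc-aws",
    "x86_64/amd/zen4", "nvidia/cc100", on_arch="zen4",
)
print(native)
print(cross)
```

Making `on_arch` optional keeps the native/cross-compile distinction in a single argument rather than in two separate formats.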
New job on instance
New job on instance
New job on instance
New job on instance
bot: help
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
bot: help
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
Updates by the bot instance
So... the updated hooks are not picked up, because they were used from the repository (…).
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance
Didn't work, for some reason. Adding more debugging output:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance
Forgot to push the change that adds the debugging output. Retrying:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance
OK, reloading the EasyBuild module was overwriting the change. Retrying by setting …:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance
Let's try the others again too, to make sure that they (still) work.
Native builds:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
Cross compiles:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc100
New job on instance
New job on instance
New job on instance
New job on instance
New job on instance
The last failure was because of: … I.e. this is where we're trying to install the CUDA SDKs in ….
Trying again with this fix, EESSI/software-layer-scripts@8292fa3:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws on:arch=zen4 for:arch=x86_64/amd/zen4,accel=nvidia/cc100
New job on instance
New job on instance
Many things still need to be done... The software-layer-scripts PR should make sure to … --module-only if they target CC100 or above.

cuDNN 9.5.0 comes, I think, only with 9.0 device code, not 9.0a. Thus, we should change the requested CC to 9.0 for that particular software name and version. For cuDNN 9.10.1.4, I think 9.0a is supported, but 10.0f is not, so it should be changed to 10.0.

I'd prefer to make those changes in hooks to avoid having to open multiple different software-layer PRs, each with custom options for the build. An added advantage is that doing it in the hooks also fixes things for EESSI-extend-based installations.

Edit 08-01:
cuDNN-9.5.0.50 indeed contains device code for 7.0, 8.0 and 9.0, but not for 9.0a, which causes the sanity check to fail. So we should convert 9.0a to 9.0 (in a hook?) for this version. cuDNN-9.10.1.4 contains device code for 7.0, 8.0, 9.0a, 10.0 and 12.0, but not for 10.0f and 12.0f, so the suffix needs to be stripped for those as well.

This PR should replace #1278, #1286 and #1287.
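The Python sketch below shows the hook logic proposed above. It assumes the requested compute capabilities arrive as strings such as 9.0a or 10.0f, as with EasyBuild's cuda_compute_capabilities setting. SHIPPED_DEVICE_CODE only restates the device code listed in the 08-01 edit. That table and all function names are hypothetical. The sketch does not show how the result is wired into the actual hooks. targets_cc100_or_above is likewise only an illustrative check for the "CC100 or above" condition.

```python
"""Hypothetical hook helpers for the cuDNN device-code mismatch (illustrative only)."""

# Device code reported for each prebuilt cuDNN release in the 08-01 edit.
# This table is an assumption-laden restatement, not EESSI's actual hook data.
SHIPPED_DEVICE_CODE = {
    ("cuDNN", "9.5.0.50"): {"7.0", "8.0", "9.0"},
    ("cuDNN", "9.10.1.4"): {"7.0", "8.0", "9.0a", "10.0", "12.0"},
}


def strip_suffix(cc):
    """Drop a trailing 'a' or 'f' suffix, e.g. '10.0f' -> '10.0'."""
    return cc[:-1] if cc.endswith(("a", "f")) else cc


def cc_tuple(cc):
    """Parse '10.0f' into (10, 0) so that 10.0 compares greater than 9.0.

    Plain string comparison would get this wrong, since '10.0' < '9.0'.
    """
    major, minor = strip_suffix(cc).split(".")
    return int(major), int(minor)


def targets_cc100_or_above(ccs):
    """True if any requested compute capability is 10.0 or newer."""
    return any(cc_tuple(cc) >= (10, 0) for cc in ccs)


def adjust_ccs(name, version, requested):
    """Map requested CCs onto device code the binary ships.

    A suffix is stripped only if the suffixed CC is missing and the plain one
    is shipped. Unmatched CCs pass through unchanged, so the sanity check still
    reports them. Software without a table entry is returned unchanged.
    """
    shipped = SHIPPED_DEVICE_CODE.get((name, version))
    if shipped is None:
        return list(requested)
    adjusted = []
    for cc in requested:
        if cc not in shipped and strip_suffix(cc) in shipped:
            cc = strip_suffix(cc)
        if cc not in adjusted:  # e.g. '9.0' and '9.0a' both map to '9.0'
            adjusted.append(cc)
    return adjusted


print(adjust_ccs("cuDNN", "9.5.0.50", ["8.0", "9.0a"]))  # ['8.0', '9.0']
print(adjust_ccs("cuDNN", "9.10.1.4", ["9.0a", "10.0f", "12.0f"]))  # ['9.0a', '10.0', '12.0']
print(targets_cc100_or_above(["9.0a"]))  # False
```

Because the table is keyed on software name and version, 9.0a is kept for cuDNN 9.10.1.4, which ships it, but converted to 9.0 for cuDNN 9.5.0.50. This matches the per-version handling described in the edit.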